BITSTREAM-CONTROLLED POST-PROCESSING FILTERING



CROSS REFERENCE TO RELATED APPLICATION 

This application claims the benefit of U.S. Provisional Patent Application Serial 
No. 60/501,081, entitled "VIDEO ENCODING AND DECODING TOOLS AND
TECHNIQUES," filed September 7, 2003, the disclosure of which is incorporated 
herein by reference. 



TECHNICAL FIELD 

Techniques and tools for bitstream-controlled filtering are described. For
example, a video encoder provides control information for post-processing filtering, and
a video decoder performs bitstream-controlled post-processing filtering with a
de-ringing and/or de-blocking filter.



BACKGROUND

Digital video consumes large amounts of storage and transmission capacity. A 
typical raw digital video sequence includes 15 or 30 frames per second. Each frame can
include tens or hundreds of thousands of pixels (also called pels). Each pixel represents
a tiny element of the picture. In raw form, a computer commonly represents a pixel
with 24 bits. Thus, the number of bits per second, or bitrate, of a typical raw digital
video sequence can be 5 million bits/second or more. 
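
The arithmetic behind this figure can be sketched as follows. This is an illustrative calculation only; the function name and the 176x144 frame size are assumptions chosen for the example and do not appear in the description above.

```python
def raw_bitrate(width, height, frames_per_second, bits_per_pixel=24):
    """Return the bitrate, in bits per second, of uncompressed video."""
    return width * height * frames_per_second * bits_per_pixel


# Hypothetical frame size: 176x144 gives 25,344 pixels per frame,
# i.e., "tens of thousands" of pixels.
bits_per_second = raw_bitrate(176, 144, 15)
print(bits_per_second)  # 9123840
```

Even at this small frame size and 15 frames per second, the raw bitrate exceeds 5 million bits/second.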

Most computers and computer networks lack the resources to process raw digital 
video. For this reason, engineers use compression (also called coding or encoding) to 
reduce the bitrate of digital video. Compression can be lossless, in which quality of the
video does not suffer but decreases in bitrate are limited by the complexity of the video.
Or, compression can be lossy, in which quality of the video suffers but decreases in
bitrate are more dramatic. Decompression reverses compression.

In general, video compression techniques include intraframe compression and
interframe compression. Intraframe compression techniques compress individual
frames, typically called I-frames or key frames. Interframe compression techniques 
compress frames with reference to preceding and/or following frames, which are 
typically called predicted frames, P-frames, or B-frames. 

Microsoft Corporation's Windows Media Video Versions 8 ["WMV8"] and 9
["WMV9"] each include a video encoder and a video decoder. The encoders use
intraframe and interframe compression, and the decoders use intraframe and interframe
decompression. There are also several international standards for video compression
and decompression, including the Motion Picture Experts Group ["MPEG"] 1, 2, and 4
standards and the H.26x standards. Like WMV8 and WMV9, these standards use a
combination of intraframe and interframe compression and decompression.

I. Block-based Intraframe Compression and Decompression 

Many prior art encoders use block-based intraframe compression. To illustrate, 
suppose an encoder splits a video frame into 8x8 blocks of pixels and applies an 8x8
Discrete Cosine Transform ["DCT"] to individual blocks. The DCT converts a given
8x8 block of pixels (spatial information) into an 8x8 block of DCT coefficients
(frequency information). The DCT operation itself is lossless or nearly lossless. The
encoder quantizes the DCT coefficients, resulting in an 8x8 block of quantized DCT
coefficients. Quantization is lossy, resulting in loss of precision, if not complete loss of
the information for the coefficients. The encoder then prepares the 8x8 block of
quantized DCT coefficients for entropy encoding and performs the entropy encoding,
which is a form of lossless compression.

A corresponding decoder performs a corresponding decoding process. For a 
given block, the decoder performs entropy decoding, inverse quantization, an inverse 
DCT, etc., resulting in a reconstructed block. Due to the quantization, the reconstructed 
block is not identical to the original block. In fact, there may be perceptible errors 
within reconstructed blocks or at the boundaries between reconstructed blocks.
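
A minimal sketch of the transform, quantization, and reconstruction steps described above follows. It assumes an orthonormal DCT-II and a single uniform quantization step of 16; the function names, the sample block, and the step value are hypothetical and are not taken from any particular codec. Entropy coding is omitted because it is lossless.

```python
import math

N = 8  # block size used in the illustration above


def _scale(k):
    # Normalization factor that makes the transform orthonormal.
    return math.sqrt((1.0 if k == 0 else 2.0) / N)


def _basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """Forward 2-D DCT-II of an NxN block given as a list of rows."""
    return [[_scale(u) * _scale(v)
             * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse of dct_2d."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v]
                 * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, step):
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    return [[level * step for level in row] for row in levels]


def max_abs_error(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(N) for j in range(N))


# Hypothetical 8x8 block of pixel values (a smooth gradient).
block = [[8 * x + 3 * y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)

# Transform alone: reconstruction error is only floating-point round-off.
transform_only_error = max_abs_error(block, idct_2d(coeffs))

# Transform plus quantization: precision is lost and cannot be recovered.
step = 16
lossy_block = idct_2d(dequantize(quantize(coeffs, step), step))
lossy_error = max_abs_error(block, lossy_block)

print(transform_only_error, lossy_error)
```

In this sketch the error after the transform alone is negligible, whereas the error after quantization is not, which is why the reconstructed block differs from the original.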

II. Block-based Interframe Compression and Decompression 

Many prior art encoders use block-based motion-compensated prediction coding 
followed by transform coding of residuals. To illustrate, suppose an encoder splits a
predicted frame into 8x8 blocks of pixels. Groups of four 8x8 luminance blocks and
two co-located 8x8 chrominance blocks form macroblocks. Motion estimation
approximates the motion of the macroblock relative to a reference frame, for example, a
previously coded, preceding frame. The encoder computes a motion vector for the
macroblock. In motion compensation, the motion vector is used to compute a
prediction macroblock for the macroblock using information from the reference frame.
The prediction is rarely perfect, so the encoder usually encodes blocks of pixel 
differences (also called the error or residual blocks) between the prediction and the 
original macroblock. The encoder applies a DCT to the error blocks, resulting in blocks 
of coefficients. The encoder quantizes the DCT coefficients, prepares the blocks of
quantized DCT coefficients for entropy encoding, and performs the entropy encoding. 

A corresponding decoder performs a corresponding decoding process. The 
decoder performs entropy decoding, inverse quantization, an inverse DCT, etc., 
resulting in reconstructed error blocks. In a separate motion compensation path, the
decoder computes a prediction using motion vector information relative to a reference 
frame. The decoder combines the prediction with the reconstructed error blocks. 
Again, the reconstructed video is not identical to the corresponding original, and there 
may be perceptible errors within reconstructed blocks or at the boundaries between 
reconstructed blocks.
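
The motion estimation, motion compensation, and residual steps described above can be sketched as follows. This is a simplified illustration, assuming full-search block matching with a sum-of-absolute-differences cost on a single 4x4 block; the function names, block size, and test data are hypothetical, and the transform and quantization of the residual are omitted.

```python
def get_block(frame, top, left, size):
    """Copy a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def estimate_motion(reference, current_block, top, left, size, search_range):
    """Return the (dy, dx) displacement with the lowest SAD cost."""
    height, width = len(reference), len(reference[0])
    best_cost, best_vector = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate falls outside the reference frame
            candidate = get_block(reference, t, l, size)
            cost = sum(abs(c - r)
                       for crow, rrow in zip(current_block, candidate)
                       for c, r in zip(crow, rrow))
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dy, dx)
    return best_vector


def motion_compensate(reference, top, left, vector, size):
    dy, dx = vector
    return get_block(reference, top + dy, left + dx, size)


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical data: the current block at (4, 4) matches reference content
# located two pixels to the left, except for one changed pixel.
reference = [[16 * y + x for x in range(16)] for y in range(16)]
current_block = get_block(reference, 4, 2, 4)
current_block[1][1] += 1

vector = estimate_motion(reference, current_block, 4, 4, 4, 3)
prediction = motion_compensate(reference, 4, 4, vector, 4)
error_block = subtract(current_block, prediction)  # encoder side
reconstructed = add(prediction, error_block)       # decoder side
print(vector, error_block)
```

Because the residual is not quantized in this sketch, the reconstruction is exact; in a lossy codec the quantized residual makes the reconstruction only approximate.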

III. Blocking Artifacts and Ringing Artifacts 

Lossy compression can result in noticeable errors in video after reconstruction. 
The heavier the lossy compression and the higher the quality of the original video, the
more likely it is for perceptible errors to be introduced in the reconstructed video. Two
common kinds of errors are blocking artifacts and ringing artifacts. 

Block-based compression techniques have benefits such as ease of 
implementation, but introduce blocking artifacts, which are perhaps the most common 
and annoying type of distortion in digital video today. Blocking artifacts are visible
discontinuities around the edges of blocks in reconstructed video. Quantization and
truncation (e.g., of transform coefficients from a block-based transform) cause blocking
artifacts, especially when the compression ratio is high. When blocks are quantized
independently, for example, one block may be quantized less or more than an adjacent
block. Upon reconstruction, this can result in blocking artifacts at the boundary
between the two blocks. Or, blocking artifacts may result when high-frequency
coefficients are quantized, if the overall content of the blocks differs and the
high-frequency coefficients are necessary to reconstruct transition detail across block
boundaries.
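
A one-dimensional sketch of this effect follows. It assumes an extreme case in which quantization discards every coefficient except the block average, which is modeled here by replacing each block with its mean; the function name and the ramp data are hypothetical.

```python
def keep_block_average_only(samples, block_size):
    """Model very heavy quantization: each block keeps only its mean value."""
    out = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        mean = sum(block) / len(block)
        out.extend([mean] * len(block))
    return out


# A smooth ramp spanning two adjacent 8-sample blocks.
ramp = list(range(0, 32, 2))
reconstructed = keep_block_average_only(ramp, 8)

original_step = ramp[8] - ramp[7]                       # step across the boundary
reconstructed_step = reconstructed[8] - reconstructed[7]
print(original_step, reconstructed_step)  # 2 16.0
```

The smooth transition of 2 across the block boundary becomes a jump of 16 after reconstruction, which is the kind of visible discontinuity referred to above as a blocking artifact.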

Ringing artifacts are caused by quantization or truncation of high-frequency 
transform coefficients, whether the transform coefficients are from a block-based 
transform or from a wavelet-based transform. Both such transforms essentially 
represent an area of pixels as a sum of regular waveforms, where the waveform
coefficients are quantized, encoded, etc. In some cases, the contributions of
high-frequency waveforms counter distortion introduced by a low-frequency waveform. If
the high-frequency coefficients are heavily quantized, the distortion may become visible
as a wave-like oscillation at the low frequency. For example, suppose an image area
includes sharp edges or contours, and high-frequency coefficients are heavily quantized.
In a reconstructed image, the quantization may cause ripples or oscillations around the
sharp edges or contours.
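
The following one-dimensional sketch illustrates the effect. It assumes an 8-point orthonormal DCT-II and models heavy quantization by setting the four highest-frequency coefficients to zero; the function names and the edge data are hypothetical.

```python
import math


def dct_1d(samples):
    n = len(samples)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(s * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i, s in enumerate(samples))
            for k in range(n)]


def idct_1d(coeffs):
    n = len(coeffs)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c
                * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                for k, c in enumerate(coeffs))
            for i in range(n)]


# A sharp edge: four dark samples followed by four bright samples.
edge = [0] * 4 + [100] * 4
coeffs = dct_1d(edge)

# Discard the high-frequency half of the coefficients.
truncated = coeffs[:4] + [0.0] * 4
rippled = idct_1d(truncated)
print([round(v, 1) for v in rippled])
```

The reconstructed samples dip below 0 on the dark side of the edge and rise above 100 on the bright side, i.e., they oscillate around the edge instead of staying flat.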

IV. Post-Processing Filtering

Blocking artifacts and ringing artifacts can be reduced using de-blocking and
de-ringing techniques. These techniques are generally referred to as post-processing
techniques, since they are typically applied after video has been decoded.
Post-processing usually enhances the perceived quality of reconstructed video.

The WMV8 and WMV9 decoders use specialized filters to reduce blocking and
ringing artifacts during post-processing. For additional information, see Annex A of 
U.S. Provisional Patent Application Serial No. 60/341,674, filed December 17, 2001 
and Annex A of U.S. Provisional Patent Application Serial No. 60/488,710, filed July 
18, 2003. Similarly, software implementing several of the MPEG and H.26x standards
mentioned above has de-blocking and/or de-ringing filters. For example, see (1) the 
MPEG-4 de-blocking and de-ringing filters as tested in the verification model and 
described in Annex F, Section 15.3 of MPEG-4 draft N2202, (2) the H.263+
post-processing filter as tested in the Test Model Near-term, and (3) the H.264 JM
post-processing filter. In addition, numerous publications address post-processing filtering
techniques (as well as corresponding pre-processing techniques, in some cases). For
example, see (1) Kuo et al., "Adaptive Postprocessor for Block Encoded Images," IEEE
Trans. on Circuits and Systems for Video Technology, Vol. 5, No. 4 (Aug. 1995), (2)
O'Rourke et al., "Improved Image Decompression for Reduced Transform Coding
Artifacts," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 5, No. 6,
(1995), and (3) Segall et al., "Pre- and Post-Processing Algorithms for Compressed
Video Enhancement," Proc. 34th Asilomar Conf. on Signals and Systems (2000).

Figure 1 is a generalized diagram of post-processing filtering according to the 
prior art. A video encoder (110) accepts source video (105), encodes it, and produces a
video bitstream (115). The video bitstream (115) is delivered via a channel (120), for
example, by transmission as streaming media over a network. A video decoder (130)
receives and decodes the video bitstream (115), producing decoded video (135). A
post-processing filter (140) such as a de-ringing and/or de-blocking filter is used on the
decoded video (135), producing decoded, post-processed video (145).

Strictly speaking, post-processing filtering techniques are not needed to decode 
the video bitstream (115). Codec (enCOder/DECoder) engineers may decide whether to
apply such techniques when designing a codec. The decision can depend, for example,
on whether CPU cycles are available for a software decoder, or on the additional cost
for a hardware decoder. Since post-processing filtering techniques usually enhance
video quality significantly, they are commonly applied in most video decoders today.
Post-processing filters are sometimes designed independently from a video codec, so
the same de-blocking and de-ringing filters may be applied to different codecs.

In prior systems, post-processing filtering is applied automatically to an entire 
video sequence. The assumption is that post-processing filtering will always at least 
improve video quality, and thus post-processing filtering should always be on. From 
system to system, filters may have different strengths according to the capabilities of the
decoder. Moreover, some filters selectively disable or change the strength of filtering
depending on decoder-side evaluation of the content of reconstructed video, but this 
adaptive processing is still automatically performed. There are several problems with 
these approaches. 

First, the assumption that post-processing filtering always at least improves 
video quality is incorrect. For high quality video that is compressed without much loss,
post-processing de-blocking and de-ringing may eliminate texture details and noticeably
blur video images, actually decreasing quality. This sometimes occurs for high
definition video encoded at high bitrates.

Second, there is no information in the video bitstream that guides
post-processing filtering. The author is not allowed to control or adapt post-processing
filtering by introducing information in the video bitstream to control the filtering.

V. In-Loop Filtering

Aside from post-processing filtering, several prior art systems use in-loop
filtering. In-loop filtering involves filtering (e.g., de-blocking filtering) on
reconstructed reference frames during motion compensation in the encoding and
decoding processes (whereas post-processing is applied after the decoding process). By
reducing artifacts in reference frames, the encoder and decoder improve the quality of
motion-compensated prediction from the reference frames. For example, see (1) section
4.4 of U.S. Provisional Patent Application Serial No. 60/341,674, filed December 17,
2001, (2) section 4.9 of U.S. Provisional Patent Application Serial No. 60/488,710, filed
July 18, 2003, (3) section 3.2.3 of the H.261 standard (which describes conditional
low-pass filtering of macroblocks), (4) section 3.4.8 and Annex J of the H.263 standard, and
(5) the relevant sections of the H.264 standard.

In particular, the H.264 standard allows an author to turn in-loop filtering on and 
off, and even modify the strength of the filtering, on a scene-by-scene basis. The H.264 
standard does not, however, allow the author to adapt loop filtering for regions within a
frame. Moreover, the H.264 standard applies only one kind of in-loop filter.

Given the critical importance of video compression and decompression to digital 
video, it is not surprising that video compression and decompression are richly 
developed fields. Whatever the benefits of previous video compression and
decompression techniques, however, they do not have the advantages of the following 
techniques and tools. 

SUMMARY

In summary, the detailed description is directed to various techniques and tools
for bitstream-controlled filtering. For example, a video encoder puts control
information into a bitstream for encoded video. A video decoder decodes the encoded
video and, according to the control information, performs post-processing filtering on
the decoded video. With this kind of control, a human operator can allow
post-processing to the extent it enhances video quality and otherwise disable the
post-processing. In one scenario, the operator controls post-processing filtering to prevent
excessive blurring in reconstruction of high-definition, high bitrate video.

The various techniques and tools can be used in combination or independently. 

In one aspect, a video encoder or other tool receives and encodes video data, and
outputs the encoded video data as well as control information. The control information
is for controlling post-processing filtering of the video data after decoding. The
post-processing filtering includes de-blocking, de-ringing, and/or other kinds of filtering.
Typically, a human operator specifies control information such as post-processing filter
levels (i.e., filter strengths) or filter type selections. Depending on implementation, the
control information is specified for a sequence, scene, frame, region within a frame,
and/or at some other level.

In another aspect, a video decoder or other tool receives encoded video data and
control information, decodes the encoded video data, and performs post-processing
filtering on the decoded video data based at least in part upon the received control
information. Again, the post-processing filtering includes de-blocking, de-ringing,
and/or other kinds of filtering, and the control information is specified for a sequence,
scene, frame, region within a frame, and/or at some other level, depending on
implementation.
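
One way the two aspects could fit together is sketched below. Everything in this sketch is an assumption made for illustration: the one-byte layout (two filter-type flags and a 4-bit level), the names PostProcControl, pack_control, unpack_control, and post_process, and the simple three-tap smoothing filter that stands in for a real de-blocking or de-ringing filter. The sketch signals control information once for a whole list of samples and does not model per-scene, per-frame, or per-region signaling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostProcControl:
    deblock: bool  # hypothetical filter type selection
    dering: bool   # hypothetical filter type selection
    level: int     # hypothetical filter strength, 0 (off) to 15 (strongest)


def pack_control(ctrl):
    """Encoder side: serialize control information into one byte."""
    if not 0 <= ctrl.level <= 15:
        raise ValueError("level must fit in 4 bits")
    return bytes([(ctrl.level << 2) | (int(ctrl.dering) << 1) | int(ctrl.deblock)])


def unpack_control(payload):
    """Decoder side: parse the control byte back into a structure."""
    value = payload[0]
    return PostProcControl(bool(value & 1), bool(value & 2), (value >> 2) & 0x0F)


def post_process(decoded, ctrl):
    """Apply a stand-in smoothing filter only as far as the control allows."""
    if ctrl.level == 0 or not (ctrl.deblock or ctrl.dering):
        return list(decoded)  # operator disabled post-processing
    weight = ctrl.level / 15.0
    last = len(decoded) - 1
    out = []
    for i, sample in enumerate(decoded):
        average = (decoded[max(i - 1, 0)] + sample + decoded[min(i + 1, last)]) / 3.0
        out.append(sample + weight * (average - sample))
    return out


decoded = [10, 10, 10, 10, 40, 40, 40, 40]  # hypothetical decoded samples

filtered = post_process(decoded, unpack_control(
    pack_control(PostProcControl(deblock=True, dering=False, level=15))))
untouched = post_process(decoded, unpack_control(
    pack_control(PostProcControl(deblock=False, dering=False, level=0))))
print(filtered, untouched)
```

With filtering enabled, the jump at the block boundary is softened; with filtering disabled by the control information, the decoded samples pass through unchanged, which is the behavior an operator would choose to avoid blurring high-quality video.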

Additional features and advantages will be made apparent from the following
detailed description of different embodiments that proceeds with reference to the
accompanying drawings.



BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a diagram showing post-processing filtering according to the prior
art.

Figure 2 is a block diagram of a suitable computing environment.

Figure 3 is a block diagram of a generalized video encoder system.

Figure 4 is a block diagram of a generalized video decoder system.

Figure 5 is a diagram showing bitstream-controlled post-processing filtering.

Figure 6 is a flowchart showing a technique for producing a bitstream with
embedded control information for post-processing filtering.

Figure 7 is a flowchart showing a technique for performing bitstream-controlled
post-processing filtering.



DETAILED DESCRIPTION

The present application relates to techniques and tools for bitstream-controlled 
post-processing filtering for de-blocking and de-ringing reconstructed video. The 
techniques and tools give a human operator control over post-processing filtering, such 
that the operator can enable post-processing to the extent it enhances video quality and
otherwise disable the post-processing. For example, the operator controls post- 
processing filtering to prevent excessive blurring in reconstruction of high-definition, 
high bitrate video. 

Among other things, the application relates to techniques and tools for 
specifying control information, parameterizing control information, signaling control
information, and filtering according to control information. The various techniques and 
tools can be used in combination or independently. Different embodiments implement 
one or more of the described techniques and tools. 

While much of the detailed description relates directly to de-blocking and
de-ringing filtering during post-processing, the techniques and tools may also be applied at
other stages (e.g., in-loop filtering in encoding and decoding) and for other kinds of 
filtering. 

Similarly, while much of the detailed description relates to video encoders and 
decoders, another type of video processing tool or other tool may implement one or 
more of the techniques for bitstream-controlled filtering.



I. Computing Environment

Figure 2 illustrates a generalized example of a suitable computing environment 
(200) in which several of the described embodiments may be implemented. The 
computing environment (200) is not intended to suggest any limitation as to scope of 
use or functionality, as the techniques and tools may be implemented in diverse
general-purpose or special-purpose computing environments.

With reference to Figure 2, the computing environment (200) includes at least 
one processing unit (210) and memory (220). In Figure 2, this most basic configuration 
(230) is included within a dashed line. The processing unit (210) executes
computer-executable instructions and may be a real or a virtual processor. In a multi-processing
system, multiple processing units execute computer-executable instructions to increase
processing power. The memory (220) may be volatile memory (e.g., registers, cache,
RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some
combination of the two. The memory (220) stores software (280) implementing
bitstream-controlled filtering techniques for an encoder and/or decoder.

A computing environment may have additional features. For example, the 
computing environment (200) includes storage (240), one or more input devices (250), 
one or more output devices (260), and one or more communication connections (270). 
An interconnection mechanism (not shown) such as a bus, controller, or network
interconnects the components of the computing environment (200). Typically,
operating system software (not shown) provides an operating environment for other
software executing in the computing environment (200), and coordinates activities of 
the components of the computing environment (200). 



KBR/kbr 3382-66954 10/06/03 



-13- 



EXPRESS MAIL LABEL NO. EV 339201236 US 
DATE OF DEPOSIT: October 6, 2003



The storage (240) may be removable or non-removable, and includes magnetic 
disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can
be used to store information and which can be accessed within the computing
environment (200). The storage (240) stores the software (280) implementing the
bitstream-controlled filtering techniques for an encoder and/or decoder.

The input device(s) (250) may be a touch input device such as a keyboard, 
mouse, pen, or trackball, a voice input device, a scanning device, or another device that 
provides input to the computing environment (200). For audio or video encoding, the 
input device(s) (250) may be a sound card, video card, TV tuner card, or similar device 
that accepts audio or video input in analog or digital form, or a CD-ROM or CD-RW
that reads audio or video samples into the computing environment (200). The output
device(s) (260) may be a display, printer, speaker, CD-writer, or another device that
provides output from the computing environment (200).

The communication connection(s) (270) enable communication over a 
communication medium to another computing entity. The communication medium
conveys information such as computer-executable instructions, audio or video input or 
output, or other data in a modulated data signal. A modulated data signal is a signal that 
has one or more of its characteristics set or changed in such a manner as to encode 
information in the signal. By way of example, and not limitation, communication 
media include wired or wireless techniques implemented with an electrical, optical, RF,
infrared, or other carrier. 

The techniques and tools can be described in the general context of computer- 
readable media. Computer-readable media are any available media that can be accessed 
within a computing environment. By way of example, and not limitation, with the
computing environment (200), computer-readable media include memory (220), storage
(240), communication media, and combinations of any of the above.

The techniques and tools can be described in the general context of
computer-executable instructions, such as those included in program modules, being executed in a
computing environment on a target real or virtual processor. Generally, program 
modules include routines, programs, libraries, objects, classes, components, data 
structures, etc. that perform particular tasks or implement particular abstract data types. 
The functionality of the program modules may be combined or split between program 
modules as desired in various embodiments. Computer-executable instructions for
program modules may be executed within a local or distributed computing environment.



II. Generalized Video Encoder and Decoder

Figure 3 is a block diagram of a generalized video encoder (300) and Figure 4 is
a block diagram of a generalized video decoder (400).

The relationships shown between modules within the encoder and decoder 
indicate the main flow of information in the encoder and decoder; other relationships 
are not shown for the sake of simplicity. In particular, Figures 3 and 4 usually do not
show side information indicating the encoder settings, modes, tables, etc. used for a
video sequence, frame/field, macroblock, block, etc. Such side information is sent in
the output bitstream, typically after entropy encoding of the side information. The 
format of the output bitstream can be Windows Media Video version 9 format or 
another format. 






The encoder (300) and decoder (400) are block-based and use a 4:2:0 
macroblock format with each macroblock including four 8x8 luminance blocks
(at times treated as one 16x16 macroblock) and two 8x8 chrominance blocks. The
encoder (300) and decoder (400) operate on video pictures, which are video frames
and/or video fields. Alternatively, the encoder (300) and decoder (400) are
object-based, use a different macroblock or block format, or perform operations on sets of
pixels of different size or configuration than 8x8 blocks and 16x16 macroblocks. 

Depending on implementation and the type of compression desired, modules of 
the encoder or decoder can be added, omitted, split into multiple modules, combined 
with other modules, and/or replaced with like modules. In alternative embodiments,
encoders or decoders with different modules and/or other configurations of modules
perform one or more of the described techniques. 

A. Video Encoder 

Figure 3 is a block diagram of a general video encoder system (300). The
encoder system (300) receives a sequence of video pictures including a current picture
(305), and produces compressed video information (395) as output. Particular
embodiments of video encoders typically use a variation or supplemented version of the
generalized encoder (300).

The encoder system (300) compresses predicted pictures and key pictures. For
the sake of presentation, Figure 3 shows a path for key pictures through the encoder
system (300) and a path for forward-predicted pictures. Many of the components of the
encoder system (300) are used for compressing both key pictures and predicted pictures.
The exact operations performed by those components can vary depending on the type of
information being compressed.

A predicted picture (also called p-picture, b-picture for bi-directional prediction, 
or inter-coded picture) is represented in terms of prediction (or difference) from one or 
more other pictures. A prediction residual is the difference between what was predicted
and the original picture. In contrast, a key picture (also called i-picture, intra-coded 
picture) is compressed without reference to other pictures. 

If the current picture (305) is a forward-predicted picture, a motion estimator 
(310) estimates motion of macroblocks or other sets of pixels of the current picture
(305) with respect to a reference picture (325), which is the reconstructed previous
picture buffered in the picture store (320). In alternative embodiments, the reference
picture is a later picture or the current picture is bi-directionally predicted. The motion
estimator (310) outputs as side information motion information (315) such as motion
vectors. A motion compensator (330) applies the motion information (315) to the
reference picture (325) to form a motion-compensated current picture prediction (335).
The prediction is rarely perfect, however, and the difference between the motion- 
compensated current picture prediction (335) and the original current picture (305) is 
the prediction residual (345). Alternatively, a motion estimator and motion 
compensator apply another type of motion estimation/compensation. 

A frequency transformer (360) converts spatial domain video information into
frequency domain (i.e., spectral) data. For block-based video pictures, the frequency
transformer (360) applies a DCT or a variant of DCT to blocks of the pixel data or
prediction residual data, producing blocks of DCT coefficients. Alternatively, the
frequency transformer (360) applies another conventional frequency transform such as a
Fourier transform or uses wavelet or subband analysis. In some embodiments, the
frequency transformer (360) applies an 8x8, 8x4, 4x8, or other size frequency transform
(e.g., DCT) to prediction residuals for predicted pictures.

A quantizer (370) then quantizes the blocks of spectral data coefficients. The
quantizer applies uniform, scalar quantization to the spectral data with a step-size that
varies on a picture-by-picture basis or other basis. Alternatively, the quantizer applies
another type of quantization to the spectral data coefficients, for example, a
non-uniform, vector, or non-adaptive quantization, or directly quantizes spatial domain data
in an encoder system that does not use frequency transformations.
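
Uniform scalar quantization with a variable step size can be sketched as follows. This is an illustration under simple assumptions (round-to-nearest with no dead zone); the function names, coefficient values, and step sizes are hypothetical and are not those of the quantizer (370).

```python
def quantize_coeffs(coeffs, step):
    """Map each coefficient to the index of the nearest multiple of step."""
    return [round(c / step) for c in coeffs]


def reconstruct_coeffs(levels, step):
    """Inverse quantization: scale the indices back by the step size."""
    return [level * step for level in levels]


# Hypothetical spectral coefficients for one block, low to high frequency.
coeffs = [308.0, -41.5, 12.25, -3.0, 0.75]

for step in (2, 8, 32):
    levels = quantize_coeffs(coeffs, step)
    restored = reconstruct_coeffs(levels, step)
    worst = max(abs(c - r) for c, r in zip(coeffs, restored))
    print(step, levels, worst)
```

In this sketch the reconstruction error of each coefficient never exceeds half the step size, and a larger step size drives more of the small, high-frequency coefficients to zero, which lowers the bitrate at the cost of precision.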

When a reconstructed current picture is needed for subsequent motion 
estimation/compensation, an inverse quantizer (376) performs inverse quantization on 
the quantized spectral data coefficients. An inverse frequency transformer (366) then 
performs the inverse of the operations of the frequency transformer (360), producing a 

reconstructed prediction residual or reconstructed key picture data. If the current
picture (305) was a key picture, the reconstructed key picture is taken as the
reconstructed current picture (not shown). If the current picture (305) was a predicted
picture, the reconstructed prediction residual is added to the motion-compensated
current picture prediction (335) to form the reconstructed current picture. The picture
store (320) buffers the reconstructed current picture for use in predicting the next
picture. In some embodiments, the encoder (300) applies an in-loop de-blocking filter
to the reconstructed picture to adaptively smooth discontinuities at block boundaries in
the picture. For additional detail, see U.S. Patent Application Serial No. 10/322,383,
filed December 17, 2002, and U.S. Patent Application Serial No. 10/623,128, filed July
18, 2003, the disclosures of which are hereby incorporated by reference. 

The entropy coder (380) compresses the output of the quantizer (370) as well as 
certain side information. Typical entropy coding techniques include arithmetic coding, 
differential coding, Huffman coding, run length coding, LZ coding, dictionary coding,
and combinations of the above. The entropy coder (380) typically uses different coding
techniques for different kinds of information, and can choose from among multiple code
tables within a particular coding technique. 
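
As an illustration of one of the listed techniques, run length coding of quantized coefficients can be sketched as follows. The (zero run, value) pair representation, the function names, and the sample data are assumptions for this example; the sketch is not the coding used by the entropy coder (380), and a practical coder would typically further compress the pairs, for example with Huffman codes.

```python
def run_length_encode(levels):
    """Return ((zero_run, value) pairs, count of trailing zeros)."""
    pairs = []
    run = 0
    for value in levels:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs, run


def run_length_decode(pairs, trailing_zeros):
    """Exactly invert run_length_encode."""
    levels = []
    for run, value in pairs:
        levels.extend([0] * run)
        levels.append(value)
    levels.extend([0] * trailing_zeros)
    return levels


# Hypothetical quantized coefficients with many zeros after quantization.
levels = [19, 0, 0, -3, 1, 0, 0, 0, 0, 0]
pairs, trailing = run_length_encode(levels)
print(pairs, trailing)  # [(0, 19), (2, -3), (0, 1)] 5
```

Decoding the pairs restores the original list exactly, consistent with entropy coding being a lossless stage.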

The entropy coder (380) puts compressed video information (395) in the buffer 
(390). A buffer level indicator is fed back to bitrate adaptive modules. The compressed
video information (395) is depleted from the buffer (390) at a constant or relatively
constant bitrate and stored for subsequent streaming at that bitrate. Or, the encoder 
system (300) streams compressed video information at a variable rate. 

Before or after the buffer (390), the compressed video information (395) can be 
channel coded for transmission over a network. The channel coding can apply error
detection and correction data to the compressed video information (395). 

In addition, the encoder (300) accepts control information for filtering 
operations. The control information may originate from a content author or other
human operator, and may be provided to the encoder through an encoder setting or
through programmatic control by an application. Or, the control information may
originate from another source such as a module within the encoder (300) itself. The
control information controls filtering operations such as post-processing de-blocking
and/or de-ringing filtering, as described below. The encoder (300) outputs the control
information at an appropriate syntax level in the compressed video information (395). 

B. Video Decoder 
Figure 4 is a block diagram of a general video decoder system (400). The
decoder system (400) receives information (495) for a compressed sequence of video
pictures and produces output including a reconstructed picture (405). Particular
embodiments of video decoders typically use a variation or supplemented version of the
generalized decoder (400).

The decoder system (400) decompresses predicted pictures and key pictures.
For the sake of presentation, Figure 4 shows a path for key pictures through the decoder
system (400) and a path for forward-predicted pictures. Many of the components of the
decoder system (400) are used for decompressing both key pictures and predicted
pictures. The exact operations performed by those components can vary depending on
the type of information being decompressed.

A buffer (490) receives the information (495) for the compressed video
sequence and makes the received information available to the entropy decoder (480).
The buffer (490) typically receives the information at a rate that is fairly constant over
time. Alternatively, the buffer (490) receives information at a varying rate. Before or
after the buffer (490), the compressed video information can be channel decoded and
processed for error detection and correction.

The entropy decoder (480) entropy decodes entropy-coded quantized data as
well as entropy-coded side information, typically applying the inverse of the entropy
encoding performed in the encoder. Entropy decoding techniques include arithmetic
decoding, differential decoding, Huffman decoding, run length decoding, LZ decoding,
dictionary decoding, and combinations of the above. The entropy decoder (480)
frequently uses different decoding techniques for different kinds of information, and can
choose from among multiple code tables within a particular decoding technique.

If the picture (405) to be reconstructed is a forward-predicted picture, a motion
compensator (430) applies motion information (415) to a reference picture (425) to
form a prediction (435) of the picture (405) being reconstructed. For example, the
motion compensator (430) uses a macroblock motion vector to find a macroblock in the
reference picture (425). A picture store (420) stores previous reconstructed pictures for
use as reference pictures. Alternatively, a motion compensator applies another type of
motion compensation. The prediction by the motion compensator (430) is rarely
perfect, so the decoder (400) also reconstructs prediction residuals.

An inverse quantizer (470) inverse quantizes entropy-decoded data. In general,
the inverse quantizer (470) applies uniform, scalar inverse quantization to the
entropy-decoded data with a step-size that varies on a picture-by-picture basis or other
basis. Alternatively, the inverse quantizer (470) applies another type of inverse
quantization to the data, for example, a non-uniform, vector, or non-adaptive inverse
quantization, or directly inverse quantizes spatial domain data in a decoder system that
does not use inverse frequency transformations.
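
As a rough illustration of the general case, uniform scalar inverse quantization can be
sketched as scaling each entropy-decoded level by the step size. The function name and
the sample values below are hypothetical, and an actual decoder may apply rounding or
dead-zone adjustments that this sketch omits.

```python
def inverse_quantize(levels, step_size):
    """Uniform scalar inverse quantization: scale each decoded level by the step size."""
    return [level * step_size for level in levels]


# Hypothetical quantized coefficients for one block, with a per-picture step size of 8.
coefficients = inverse_quantize([3, -1, 0, 2], step_size=8)
```

Because the step size is an argument, the same routine serves a step size that changes
from picture to picture.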

An inverse frequency transformer (460) converts quantized, frequency domain
data into spatial domain video information. For block-based video pictures, the inverse
frequency transformer (460) applies an inverse DCT ["IDCT"] or variant of IDCT to
blocks of DCT coefficients, producing pixel data or prediction residual data for key
pictures or predicted pictures, respectively. Alternatively, the inverse frequency
transformer (460) applies another conventional inverse frequency transform such as an
inverse Fourier transform or uses wavelet or subband synthesis. In some embodiments,
the inverse frequency transformer (460) applies an 8x8, 8x4, 4x8, or other size inverse
frequency transform (e.g., IDCT) to prediction residuals for predicted pictures.

When the decoder (400) needs a reconstructed picture for subsequent motion
compensation, the picture store (420) buffers the reconstructed picture for use in the
motion compensation. In some embodiments, the decoder (400) applies an in-loop
de-blocking filter to the reconstructed picture to adaptively smooth discontinuities at
block boundaries in the picture, for example, as described in U.S. Patent Application
Serial Nos. 10/322,383 and 10/623,128.

The decoder (400) performs post-processing filtering such as de-blocking and/or
de-ringing filtering. For example, the decoder performs the post-processing filtering as
in the WMV8 system, WMV9 system, or other system described above.

The decoder (400) receives (as part of the information (495)) control
information for filtering operations. The control information affects operations such as
post-processing de-blocking and/or de-ringing filtering, as described below. The
decoder (400) receives the control information at an appropriate syntax level and passes
the information to the appropriate filtering modules.



III. Bitstream-Controlled Post-Processing Filtering

In some embodiments, a video encoder allows a content author or other human
operator to control the level of post-processing filtering for a particular sequence, scene,
frame, or area within a frame. The operator specifies control information, which is put
in the encoded bitstream. A decoder performs the post-processing filtering according to
the control information. This lets the operator ensure that the post-processing enhances
video quality when it is used, and that post-processing is disabled when it is not needed.
For example, the operator controls post-processing filtering to prevent excessive
blurring in reconstruction of high-definition, high bitrate video.

Figure 5 is a generalized diagram of a system (500) with bitstream-controlled
post-processing filtering. The details of the components, inputs, and outputs shown in
Figure 5 vary depending on implementation.

A video encoder (510) accepts source video (505), encodes it, and produces a
video bitstream (515). For example, the video encoder (510) is an encoder such as the
encoder (300) shown in Figure 3. Alternatively, the system (500) includes a different
video encoder (510).

In addition to receiving the source video (505), the encoder (510) receives
post-processing control information (512) that originates from input by a content author
or other human operator. For example, the author provides the post-processing control
information (512) directly to the encoder (510) or adjusts encoder settings for the
post-processing filtering. Or, some other application receives input from the author, and
that other application passes post-processing control information (512) to the encoder
(510). Alternatively, instead of a human operator specifying the post-processing control
information (512), the encoder (510) decides the control information (512) according to
codec parameters or the results of video encoding. For example, the encoder (510)
increases filter strength as the compression ratio applied increases (e.g., increasing filter
strength for larger quantization step size, and vice versa; or, decreasing filter strength
for greater encoded bits/pixels, and vice versa).
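
One way an encoder might derive a filter strength from a codec parameter is a simple
threshold mapping from quantization step size to level, sketched below. The function
name and the threshold values are assumptions chosen for illustration; they are not
taken from any particular codec.

```python
def filter_level_from_step(step_size, thresholds=(4, 8, 16)):
    """Return a post-processing level that grows with the quantization step size.

    The thresholds are illustrative assumptions, not values from any codec.
    """
    level = 0
    for threshold in thresholds:
        if step_size >= threshold:
            level += 1
    return level
```

With these thresholds, a step size below 4 yields level 0 and a step size of 16 or more
yields level 3, so stronger compression leads to stronger filtering.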

The encoder (510) puts the post-processing control information (512) in the
video bitstream (515). The encoder (510) formats the post-processing control
information (512) as fixed length codes (such as 00 for level 0, 01 for level 1, 10 for
level 2, etc.). Or, the encoder (510) uses a VLC / Huffman table to assign codes (such
as 0 for level 0, 10 for level 1, 110 for level 2, etc.), or uses some other type of entropy
encoding. The encoder (510) puts the control information (512) in a header at the
appropriate syntax level of the video bitstream (515). For example, control information
(512) for a picture is put in a picture header for the picture. For an MPEG-2 or MPEG-4
bitstream, the location in the header could be the private data section in the picture
header.
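
The two example code assignments above can be sketched as follows. The function names
are hypothetical. The variable length code shown is a simple unary-style prefix code that
reproduces the 0, 10, 110 pattern; a table-driven Huffman code would be an alternative.

```python
def encode_fixed(level, bits=2):
    """Fixed length code: level 0 -> '00', 1 -> '01', 2 -> '10' when bits=2."""
    if not 0 <= level < (1 << bits):
        raise ValueError("level does not fit in the code length")
    return format(level, "0{}b".format(bits))


def encode_unary(level):
    """Variable length code: level 0 -> '0', 1 -> '10', 2 -> '110'."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return "1" * level + "0"


def decode_unary(bitstring, pos=0):
    """Read one unary-style code starting at pos; return (level, next position)."""
    level = 0
    while bitstring[pos] == "1":
        level += 1
        pos += 1
    return level, pos + 1  # skip the terminating '0'
```

The variable length form spends fewer bits on low levels, which helps when level 0 is the
most common choice; the fixed length form is simpler to parse.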

The video bitstream (515) is delivered via a channel (520), for example, by
transmission as streaming media over a network. A video decoder (530) receives the
video bitstream (515). The decoder (530) decodes the encoded video data, producing
decoded video (535). The decoder (530) also retrieves the post-processing control
information (532) (performing any necessary decoding) and passes the control
information (532) to the post-processing filter (540).

The post-processing filter (540) uses the control information (532) to apply the
indicated post-processing filtering to the decoded video (535), producing decoded,
post-processed video (545). The post-processing filter (540) is, for example, a de-ringing
and/or de-blocking filter.

Figure 6 shows a technique (600) for producing a bitstream with embedded
control information for post-processing filtering. An encoder such as the encoder (300)
shown in Figure 3 performs the technique (600).

The encoder receives (610) video to be encoded and also receives (630) control
information for post-processing filtering. The encoder encodes (620) the video and
outputs (640) the encoded video and the control information. In one implementation,
the encoder encodes (620) the video, decodes the video, and presents the results. The
author then decides the appropriate post-processing strength, etc. for the control
information. The decision-making process for post-processing strength and other
control information may include actual post-processing in the encoder (following
decoding of the encoded frame or other portion of the video), in which the encoder
iterates through or otherwise evaluates different post-processing strengths, etc. until a
decision is reached for the frame or other portion of the video.
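
The iteration over post-processing strengths might look like the sketch below. The
decision criterion is not specified above, so this sketch assumes one: it keeps the level
whose filtered output has the lowest mean squared error against the source. The
`smooth` filter is a stand-in for a real de-blocking or de-ringing filter, and all names
and the strength mapping are hypothetical.

```python
def smooth(samples, level):
    """Stand-in filter: blend each sample with its neighbours; higher level, stronger blend."""
    weight = level / 8.0  # hypothetical strength mapping; level 0 leaves samples unchanged
    out = []
    for i, value in enumerate(samples):
        left = samples[max(i - 1, 0)]
        right = samples[min(i + 1, len(samples) - 1)]
        out.append((1 - weight) * value + weight * (left + right) / 2.0)
    return out


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def choose_level(source, decoded, levels=(0, 1, 2, 3)):
    """Try each level and keep the one whose output is closest to the source."""
    return min(levels, key=lambda lv: mean_squared_error(source, smooth(decoded, lv)))
```

When the decoded samples already match the source, the search settles on level 0, which
corresponds to disabling post-processing for high quality video. An author reviewing
the results by eye could replace the error measure with a subjective judgment.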

The technique (600) shown in Figure 6 may be repeated during encoding, for
example, to embed control information on a scene-by-scene or frame-by-frame basis in
the bitstream. More generally, depending on implementation, stages of the technique
(600) can be added, split into multiple stages, combined with other stages, rearranged
and/or replaced with like stages. In particular, the timing of the receipt (630) of the
control information can vary depending on implementation.

Figure 7 shows a technique (700) for performing bitstream-controlled
post-processing filtering. A decoder such as the decoder (400) shown in Figure 4 performs
the technique (700).

The decoder receives (710) encoded video and control information for
post-processing filtering. The decoder decodes (720) the video. The decoder then performs
(730) post-processing filtering according to the control information. The technique
(700) shown in Figure 7 may be repeated during decoding, for example, to retrieve and
apply control information on a scene-by-scene or frame-by-frame basis. More
generally, depending on implementation, stages of the technique (700) can be added,
split into multiple stages, combined with other stages, rearranged and/or replaced with
like stages.



A. Types of Post-Processing Control Information 

There are several different possibilities for the content of the post-processing
control information. The type of control information used depends on implementation.
The simplest type represents an ON/OFF decision for post-processing filtering.

Another type of control information indicates a post-processing level (i.e.,
strength) of de-blocking, de-ringing, and/or other filtering. Bitstream-controlled
post-processing is particularly useful when the control information represents the maximum
allowed post-processing level. For example, suppose a higher level indicates stronger
filtering. If the author indicates post-processing level 3 for a given video frame, the
decoder may apply post-processing of level 0, 1, 2, or 3 to the given frame, but not 4 or
higher. This approach provides some flexibility. If a software decoder does not have
enough CPU cycles available to apply level 3 for the frame, it may only apply level 2,
etc. At the same time, using a maximum allowed level achieves the main goal -
ensuring that the video will never be excessively blurred by post-processing. A smart
author sets the maximum allowed level to 0 (i.e., no post-processing) or a low level
when the decoded video is already of high quality, and sets a higher level when the
decoded video presents more blocking and ringing artifacts.

Alternatively, instead of maximum allowed levels, the control information
represents exact levels. Such control information specifies a mandatory level of
post-processing filtering, which is useful when the author wants to control post-processing
exactly. Or, the control information represents minimum allowed levels. This is useful
when the author wants to guarantee that at least a minimum level of post-processing
filtering is applied, for example, for very low bitrate video.
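
The maximum, exact, and minimum interpretations differ only in how a decoder combines
the signaled level with the level it would otherwise choose (for example, based on
available CPU cycles). The sketch below assumes hypothetical mode labels and a
hypothetical function name; how the mode itself is conveyed is left open.

```python
def resolve_level(signaled, preferred, mode):
    """Pick the level a decoder applies, given the signaled level and its own preference.

    mode is "max", "exact", or "min" (hypothetical labels for the three semantics).
    """
    if mode == "max":
        return min(preferred, signaled)  # never exceed the author's ceiling
    if mode == "exact":
        return signaled  # the decoder has no latitude
    if mode == "min":
        return max(preferred, signaled)  # never fall below the author's floor
    raise ValueError("unknown mode: " + mode)
```

For a signaled maximum of 3, a decoder that can only afford level 2 applies level 2,
while a decoder that would prefer level 5 is held to level 3.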

Still another type of control information represents filter type selections, instead
of or in addition to filter level information for one particular filter or filters. For
example, value 0 indicates no post-processing, value 1 indicates de-blocking, value 2
indicates de-ringing, value 3 indicates both de-blocking and de-ringing, etc.
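
The four example values are consistent with a two-bit mask in which one bit selects
de-blocking and the other selects de-ringing. The sketch below assumes that reading;
the constant and function names are hypothetical, and values above 3 are not covered.

```python
DEBLOCK = 1  # bit 0 set: de-blocking
DERING = 2   # bit 1 set: de-ringing


def selected_filters(value):
    """Return the names of the filters selected by a filter type value of 0 to 3."""
    names = []
    if value & DEBLOCK:
        names.append("de-blocking")
    if value & DERING:
        names.append("de-ringing")
    return names
```

Under this reading, value 3 selects both filters because both bits are set, and value 0
selects none.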

The control information alternatively includes other and/or additional types of 
information. 



B. Syntax Levels for Control Information

Depending on implementation, control information is specified for a sequence,
on a scene-by-scene basis within a sequence, on a frame-by-frame basis, on a
region-by-region basis, or on some other basis. This allows the author to review reconstructed
video and adapt the post-processing for a given sequence, scene, frame, region within a
frame, etc., depending on the syntax level(s) at which control is enabled. Similarly, the
bitstream includes syntax elements for control information at the appropriate syntax
level(s) for sequence, scene, frame, region within a frame, etc.

To specify control information for a region within a frame, the author may
define an area such as a rectangle or ellipse, for example, and the parameters for the size
and location of the area are put in the bitstream. For a rectangle, the area is definable by
the sides (a, b) and top-left corner pixel location (x, y), coded using fixed length or
variable length codes. The post-processing strength for the area is also put in the
bitstream. Alternatively, another syntax is used to specify control information for
different regions within a frame for post-processing filtering.
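
A fixed length coding of a rectangular region and its strength could be sketched as
follows. The field order, the field widths, and all names are assumptions made for
illustration; an actual bitstream syntax would define its own layout.

```python
from collections import namedtuple

# Top-left corner (x, y), sides (a, b), and post-processing strength for the area.
Region = namedtuple("Region", "x y a b strength")

# Hypothetical field widths in bits; a real syntax would define its own.
FIELD_BITS = (("x", 12), ("y", 12), ("a", 12), ("b", 12), ("strength", 2))


def pack_region(region):
    """Concatenate the fields as fixed length binary codes."""
    bits = ""
    for name, width in FIELD_BITS:
        value = getattr(region, name)
        if not 0 <= value < (1 << width):
            raise ValueError(name + " does not fit in its field")
        bits += format(value, "0{}b".format(width))
    return bits


def unpack_region(bits):
    """Inverse of pack_region: read each field back in the same order."""
    values, pos = [], 0
    for _, width in FIELD_BITS:
        values.append(int(bits[pos:pos + width], 2))
        pos += width
    return Region(*values)
```

With these widths each region costs 50 bits, so a variable length alternative may be
preferable when many small regions are signaled per frame.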



IV. Extensions 

In one or more embodiments, an operator specifies control information for
in-loop filtering for a sequence, on a scene-by-scene basis, on a frame-by-frame basis, on a
region-by-region basis, or on some other basis. The control information includes levels
(i.e., strengths) of filters, types of filters (e.g., to select from among multiple available
filters), and/or other types of information.

Having described and illustrated the principles of our invention with reference to
various embodiments, it will be recognized that the various embodiments can be
modified in arrangement and detail without departing from such principles. It should be
understood that the programs, processes, or methods described herein are not related or
limited to any particular type of computing environment, unless indicated otherwise.
Various types of general purpose or specialized computing environments may be used
with or perform operations in accordance with the teachings described herein. Elements
of embodiments shown in software may be implemented in hardware and vice versa.

In view of the many possible embodiments to which the principles of our
invention may be applied, we claim as our invention all such embodiments as may
come within the scope and spirit of the following claims and equivalents thereto.



